#name:Jinying Wei #SID:918838720

Exploring neural decision-making in response to visual stimuli: an analysis based on Steinmetz et al. (2019)

Abstract: In this report, I perform an exploratory analysis of the Steinmetz et al. (2019) data. I used ggplot2 to visualize how the data differ across sessions, mice, and stimulus conditions as a way to learn more about the dataset. I then integrated the data across sessions while exploring the homogeneity and heterogeneity of the mice. To assess the homogeneity of feedback variances, I performed ANOVA and Tukey tests; the results showed significant differences across the “mouse” and “session” variables, highlighting differences in feedback patterns. I then developed logistic regression models on various predictor variables to predict feedback, iteratively improving the model by removing statistically insignificant variables. The final model showed improved predictive performance. Overall, my analysis involved data integration, dimensionality reduction via PCA, ANOVA and Tukey’s test, and logistic regression modeling. These steps serve to explore patterns, identify important variables, and improve predictive performance.

Section 1: Introduction

In this project, I analyze the complexity of the neural decision-making process in mice. I extract the data and build models to examine whether our conjectures are supported by the data. Keep in mind that these findings come from a series of visual stimulation experiments. I draw on the detailed data collected by Steinmetz et al. (2019), focusing on how neurons respond to visual stimuli at different contrast levels. The original experiments were conducted on 10 mice over 39 sessions and involved the presentation of randomly varying visual stimuli, based on which the mice made decisions. Our analysis focuses in particular on the sequence of neural spike trains from stimulus onset to 0.4 seconds after onset, covering 18 selected sessions from four mice: Cori, Forssmann, Hench, and Lederberg. By examining this rich dataset, we hope to understand how neural activity underpins decision-making in these organisms. Research of this kind can influence other scientific directions in the real world; for example, the findings could help neuroscientists better understand the complexity of the brain, neural activity, and how the brain reacts when making a decision.

Section 2: Exploratory Analysis and Data Integration

library(data.table)
library(RColorBrewer)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::between()     masks data.table::between()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ dplyr::first()       masks data.table::first()
## ✖ lubridate::hour()    masks data.table::hour()
## ✖ lubridate::isoweek() masks data.table::isoweek()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ dplyr::last()        masks data.table::last()
## ✖ lubridate::mday()    masks data.table::mday()
## ✖ lubridate::minute()  masks data.table::minute()
## ✖ lubridate::month()   masks data.table::month()
## ✖ lubridate::quarter() masks data.table::quarter()
## ✖ lubridate::second()  masks data.table::second()
## ✖ purrr::transpose()   masks data.table::transpose()
## ✖ lubridate::wday()    masks data.table::wday()
## ✖ lubridate::week()    masks data.table::week()
## ✖ lubridate::yday()    masks data.table::yday()
## ✖ lubridate::year()    masks data.table::year()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(cowplot)
## 
## Attaching package: 'cowplot'
## 
## The following object is masked from 'package:lubridate':
## 
##     stamp
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following object is masked from 'package:purrr':
## 
##     some
setwd("/Users/yoursflo/Downloads/sessions")
session=list()
for(i in 1:18){
  session[[i]]=readRDS(paste('session',i,'.rds',sep=''))
  # print(session[[i]]$mouse_name)
  # print(session[[i]]$date_exp)
}
for(i in 1:18){
  print(paste("Session", i))
  print(paste("Unique brain areas:", length(unique(session[[i]]$brain_area))))
  print(table(session[[i]]$feedback_type))
  print(paste("Missing spks:", sum(is.na(unlist(session[[i]]$spks)))))
  print(paste("Missing feedback types:", sum(is.na(session[[i]]$feedback_type))))
  print("---")
}
## [1] "Session 1"
## [1] "Unique brain areas: 8"
## 
## -1  1 
## 45 69 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 2"
## [1] "Unique brain areas: 5"
## 
##  -1   1 
##  92 159 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 3"
## [1] "Unique brain areas: 11"
## 
##  -1   1 
##  77 151 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 4"
## [1] "Unique brain areas: 11"
## 
##  -1   1 
##  83 166 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 5"
## [1] "Unique brain areas: 10"
## 
##  -1   1 
##  86 168 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 6"
## [1] "Unique brain areas: 5"
## 
##  -1   1 
##  75 215 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 7"
## [1] "Unique brain areas: 8"
## 
##  -1   1 
##  83 169 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 8"
## [1] "Unique brain areas: 15"
## 
##  -1   1 
##  89 161 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 9"
## [1] "Unique brain areas: 12"
## 
##  -1   1 
## 117 255 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 10"
## [1] "Unique brain areas: 13"
## 
##  -1   1 
## 170 277 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 11"
## [1] "Unique brain areas: 6"
## 
##  -1   1 
##  70 272 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 12"
## [1] "Unique brain areas: 12"
## 
##  -1   1 
##  89 251 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 13"
## [1] "Unique brain areas: 15"
## 
##  -1   1 
##  61 239 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 14"
## [1] "Unique brain areas: 10"
## 
##  -1   1 
##  82 186 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 15"
## [1] "Unique brain areas: 8"
## 
##  -1   1 
##  95 309 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 16"
## [1] "Unique brain areas: 6"
## 
##  -1   1 
##  79 201 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 17"
## [1] "Unique brain areas: 6"
## 
##  -1   1 
##  38 186 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
## [1] "Session 18"
## [1] "Unique brain areas: 10"
## 
##  -1   1 
##  42 174 
## [1] "Missing spks: 0"
## [1] "Missing feedback types: 0"
## [1] "---"
i_session = 3
i_trial = 12
spikes = session[[i_session]]$spks[[i_trial]]
total_spikes = rowSums(spikes)
ggplot(data.frame(neuron = 1:length(total_spikes), total_spikes = total_spikes), aes(x = neuron, y = total_spikes)) +
  geom_bar(stat = "identity") +
  labs(x = "Neuron", y = "Total number of spikes", title = paste("Total number of spikes per neuron in trial", i_trial, "of session", i_session))

avg_spikes = colMeans(spikes)

feedback_type = session[[i_session]]$feedback_type[i_trial]

avg_spikes_per_trial <- sapply(session[[i_session]]$spks, function(x) mean(rowSums(x)))

cor.test(avg_spikes_per_trial, session[[i_session]]$feedback_type)
## 
##  Pearson's product-moment correlation
## 
## data:  avg_spikes_per_trial and session[[i_session]]$feedback_type
## t = 4.3608, df = 226, p-value = 1.969e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1542517 0.3942492
## sample estimates:
##      cor 
## 0.278594
num_neurons <- sapply(session, function(x) max(sapply(x$spks, nrow)))
print(paste("Number of neurons per session: ", num_neurons))
##  [1] "Number of neurons per session:  734" 
##  [2] "Number of neurons per session:  1070"
##  [3] "Number of neurons per session:  619" 
##  [4] "Number of neurons per session:  1769"
##  [5] "Number of neurons per session:  1077"
##  [6] "Number of neurons per session:  1169"
##  [7] "Number of neurons per session:  584" 
##  [8] "Number of neurons per session:  1157"
##  [9] "Number of neurons per session:  788" 
## [10] "Number of neurons per session:  1172"
## [11] "Number of neurons per session:  857" 
## [12] "Number of neurons per session:  698" 
## [13] "Number of neurons per session:  983" 
## [14] "Number of neurons per session:  756" 
## [15] "Number of neurons per session:  743" 
## [16] "Number of neurons per session:  474" 
## [17] "Number of neurons per session:  565" 
## [18] "Number of neurons per session:  1090"
num_trials <- sapply(session, function(x) length(x$spks))
print(paste("Number of trials per session: ", num_trials))
##  [1] "Number of trials per session:  114" "Number of trials per session:  251"
##  [3] "Number of trials per session:  228" "Number of trials per session:  249"
##  [5] "Number of trials per session:  254" "Number of trials per session:  290"
##  [7] "Number of trials per session:  252" "Number of trials per session:  250"
##  [9] "Number of trials per session:  372" "Number of trials per session:  447"
## [11] "Number of trials per session:  342" "Number of trials per session:  340"
## [13] "Number of trials per session:  300" "Number of trials per session:  268"
## [15] "Number of trials per session:  404" "Number of trials per session:  280"
## [17] "Number of trials per session:  224" "Number of trials per session:  216"
stimuli_conditions <- lapply(session, function(x) table(paste(x$contrast_left, x$contrast_right)))
print("Stimuli conditions per session: ")
## [1] "Stimuli conditions per session: "
print(stimuli_conditions)
## [[1]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        24         4        14         9         2         1         4        20 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##         8         4         3         3         9         5         2         2 
## 
## [[2]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        66        10        15        42         3         7         6         9 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        24         5         7         3        22        19         6         7 
## 
## [[3]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        76         6        21        34         4         9         3        15 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        17         1         5         5        12        10         2         8 
## 
## [[4]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        68        18        12        14        12         4        12        13 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        14        16         5        11        13        17        12         8 
## 
## [[5]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        71         9        13        19        10         7        13        16 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        10        11         9        13        14        21        10         8 
## 
## [[6]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        79        13        14        16        18         5        17        12 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##         9        17         9        18        19        24        13         7 
## 
## [[7]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        89         9        12        10         7         2        14        23 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        17        14         5        11         6        13        14         6 
## 
## [[8]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        41         7        15        31        10         4         5        21 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        18         2         6         5        33        34        12         6 
## 
## [[9]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        76        20        22        35        14         9        11        11 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        51         5         6         6        52        40         9         5 
## 
## [[10]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##       113        21        22        37        21        12        13        14 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        36        12        11        15        75        28         7        10 
## 
## [[11]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        82        17        30        30        16         5         7        22 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        38         9         9         6        37        16         7        11 
## 
## [[12]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##       108        11        19        35        12         6        13        24 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        26        13         7         9        21        20        11         5 
## 
## [[13]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        82        12        18        23         8         6        12        17 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        27        11         5         9        21        35        10         4 
## 
## [[14]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        85         4        24        17         7         3        13        15 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        17         9         5        10        19        28         8         4 
## 
## [[15]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##       117        11        27        36        11         4         9        32 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        31        15         7        15        30        37        16         6 
## 
## [[16]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        78         8        20        18         9         6         6        21 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        22         8         4         9        18        42         8         3 
## 
## [[17]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        53         6        14        30         7         4        11        19 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        15         8         5         7        15        21         6         3 
## 
## [[18]]
## 
##       0 0    0 0.25     0 0.5       0 1    0.25 0 0.25 0.25  0.25 0.5    0.25 1 
##        63         8        14        18         8         5        10        13 
##     0.5 0  0.5 0.25   0.5 0.5     0.5 1       1 0    1 0.25     1 0.5       1 1 
##        17         6         3         8        22        13         6         2
feedback_types <- lapply(session, function(x) table(x$feedback_type))
print("Feedback types per session: ")
## [1] "Feedback types per session: "
print(feedback_types)
## [[1]]
## 
## -1  1 
## 45 69 
## 
## [[2]]
## 
##  -1   1 
##  92 159 
## 
## [[3]]
## 
##  -1   1 
##  77 151 
## 
## [[4]]
## 
##  -1   1 
##  83 166 
## 
## [[5]]
## 
##  -1   1 
##  86 168 
## 
## [[6]]
## 
##  -1   1 
##  75 215 
## 
## [[7]]
## 
##  -1   1 
##  83 169 
## 
## [[8]]
## 
##  -1   1 
##  89 161 
## 
## [[9]]
## 
##  -1   1 
## 117 255 
## 
## [[10]]
## 
##  -1   1 
## 170 277 
## 
## [[11]]
## 
##  -1   1 
##  70 272 
## 
## [[12]]
## 
##  -1   1 
##  89 251 
## 
## [[13]]
## 
##  -1   1 
##  61 239 
## 
## [[14]]
## 
##  -1   1 
##  82 186 
## 
## [[15]]
## 
##  -1   1 
##  95 309 
## 
## [[16]]
## 
##  -1   1 
##  79 201 
## 
## [[17]]
## 
##  -1   1 
##  38 186 
## 
## [[18]]
## 
##  -1   1 
##  42 174
summary_data <- data.frame(
  Session = 1:18,
  Neurons = num_neurons,
  Trials = num_trials,
  Stimuli_Conditions = sapply(stimuli_conditions, function(x) paste(names(x), collapse = ", ")),
  Feedback_Types = sapply(feedback_types, function(x) paste(names(x), collapse = ", "))
)

print(summary_data)
##    Session Neurons Trials
## 1        1     734    114
## 2        2    1070    251
## 3        3     619    228
## 4        4    1769    249
## 5        5    1077    254
## 6        6    1169    290
## 7        7     584    252
## 8        8    1157    250
## 9        9     788    372
## 10      10    1172    447
## 11      11     857    342
## 12      12     698    340
## 13      13     983    300
## 14      14     756    268
## 15      15     743    404
## 16      16     474    280
## 17      17     565    224
## 18      18    1090    216
##                                                                                                        Stimuli_Conditions
## 1  0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 2  0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 3  0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 4  0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 5  0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 6  0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 7  0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 8  0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 9  0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 10 0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 11 0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 12 0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 13 0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 14 0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 15 0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 16 0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 17 0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
## 18 0 0, 0 0.25, 0 0.5, 0 1, 0.25 0, 0.25 0.25, 0.25 0.5, 0.25 1, 0.5 0, 0.5 0.25, 0.5 0.5, 0.5 1, 1 0, 1 0.25, 1 0.5, 1 1
##    Feedback_Types
## 1           -1, 1
## 2           -1, 1
## 3           -1, 1
## 4           -1, 1
## 5           -1, 1
## 6           -1, 1
## 7           -1, 1
## 8           -1, 1
## 9           -1, 1
## 10          -1, 1
## 11          -1, 1
## 12          -1, 1
## 13          -1, 1
## 14          -1, 1
## 15          -1, 1
## 16          -1, 1
## 17          -1, 1
## 18          -1, 1

Analysis: First, I summarize the successes and failures of each session. Across these 18 sessions, there were more successes (feedback type ‘1’) than failures (feedback type ‘-1’). For example, in session 15, 95 trials had feedback type -1 (failure) and 309 trials had feedback type 1 (success).

ggplot: According to the figure, one neuron in the 400-600 range (on the x-axis) fires about 50 times (on the y-axis) during trial 12 of session 3. That neuron may be an outlier.
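One way to flag such a neuron numerically, rather than by eye, is a mean plus three standard deviations rule on the per-neuron spike totals. This is a minimal sketch on a simulated spike matrix (the simulated data and the 3-SD threshold are illustrative assumptions, not part of the original analysis):

```r
set.seed(1)
# Simulated spike matrix: 100 neurons x 40 time bins, mostly low counts
spikes <- matrix(rpois(100 * 40, lambda = 0.2), nrow = 100)
spikes[57, ] <- rpois(40, lambda = 1.5)  # plant one hyperactive neuron

total_spikes <- rowSums(spikes)

# Flag neurons whose total exceeds the mean by more than 3 SDs
threshold <- mean(total_spikes) + 3 * sd(total_spikes)
outliers  <- which(total_spikes > threshold)
print(outliers)
```

With the real data, the same rule could be applied to the `total_spikes` vector computed from `session[[i_session]]$spks[[i_trial]]` above.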

# session
n.session=length(session)

meta <- tibble(
  mouse_name = rep('name',n.session),
  date_exp =rep('dt',n.session),
  n_brain_area = rep(0,n.session),
  n_neurons = rep(0,n.session),
  n_trials = rep(0,n.session),
  success_rate = rep(0,n.session)
)

for(i in 1:n.session){
  tmp = session[[i]];
  meta[i,1]=tmp$mouse_name;
  meta[i,2]=tmp$date_exp;
  meta[i,3]=length(unique(tmp$brain_area));
  meta[i,4]=dim(tmp$spks[[1]])[1];
  meta[i,5]=length(tmp$feedback_type);
  meta[i,6]=mean(tmp$feedback_type+1)/2;
}
summary(meta)
##   mouse_name          date_exp          n_brain_area     n_neurons     
##  Length:18          Length:18          Min.   : 5.00   Min.   : 474.0  
##  Class :character   Class :character   1st Qu.: 6.50   1st Qu.: 707.0  
##  Mode  :character   Mode  :character   Median :10.00   Median : 822.5  
##                                        Mean   : 9.50   Mean   : 905.8  
##                                        3rd Qu.:11.75   3rd Qu.:1086.8  
##                                        Max.   :15.00   Max.   :1769.0  
##     n_trials      success_rate   
##  Min.   :114.0   Min.   :0.6053  
##  1st Qu.:249.2   1st Qu.:0.6616  
##  Median :261.0   Median :0.6898  
##  Mean   :282.3   Mean   :0.7074  
##  3rd Qu.:330.0   3rd Qu.:0.7590  
##  Max.   :447.0   Max.   :0.8304

summary:

n_brain_area: Represents the number of unique brain regions recorded in each session. The minimum number of brain regions recorded in a single session was 5 and the maximum was 15. The median number of brain regions recorded across sessions was 10, with an average of about 9.5 brain regions recorded per session.

n_neurons: This represents the number of neurons recorded in each session (taken from the first trial). Sessions recorded a minimum of 474 neurons and a maximum of 1,769. The median number of neurons recorded across sessions was 822.5, with an average of about 906 neurons per session.

n_trials: This gives the number of trials conducted per session. The smallest session held 114 trials and the largest held 447. The median number of trials across sessions was 261, with an average of about 282 trials per session.

success_rate: This indicates the proportion of successful trials per session. The lowest success rate across sessions was about 60.53%, and the highest was 83.04%. The median success rate was about 68.98% and the average was about 70.74%.
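The success rate comes from mapping feedback codes of -1/1 onto 0/1 and averaging, exactly as in the `mean(tmp$feedback_type+1)/2` line of the loop above. The same transformation on a small mock feedback vector (the vector itself is made up for illustration):

```r
# feedback_type codes: -1 = failure, 1 = success
feedback <- c(1, 1, -1, 1, -1, 1, 1, 1, -1, 1)

# (f + 1) / 2 maps -1 -> 0 and 1 -> 1, so the mean is the success rate
success_rate <- mean((feedback + 1) / 2)
print(success_rate)  # 7 successes out of 10 trials -> 0.7
```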

meta$session<-paste0("session",1:18)
count<-data.table()
count$sessionname<-rep(paste0("session",1:18),2)
count$type<-c(rep("neurons",18),rep("trials",18))
count$count<-999
for (i in 1:18) {
  count$count[i]<-length(session[[i]]$brain_area)
  count$count[i+18]<-length(session[[i]]$feedback_type)
}
count<-as.data.frame(count)

p1 <- ggplot2::ggplot(count,
                      aes(x=sessionname,y=count,group=type,fill=type))+geom_bar(stat = "identity",position="dodge")+
  scale_fill_manual(values = c("#8c510a", "#f6e8c3", 
                                        "#c7eae5", "#5ab4ac", "#01665e", "#af8dc3"))+theme_light()+theme(axis.text.x = element_text(angle = 30,hjust = 1))
p1

count1<-data.table()
for (i in 1:18) {
  count<-data.table()
  count$type<-c(names(table(session[[i]]$contrast_right)),names(table(session[[i]]$contrast_left)))
  count$count<-c(table(session[[i]]$contrast_right),table(session[[i]]$contrast_left))
  count$sessionname<-paste0("session",i)
  count$posi<-c(rep("contrast_right",4),rep("contrast_left",4))
  count1<-rbind(count1,count)
}
count1<-as.data.frame(count1)

p2<-ggplot2::ggplot(count1[count1$posi=="contrast_left",],
                aes(x=sessionname,y=count,group=type,fill=type))+geom_bar(stat = "identity")+
  scale_fill_manual(values = c("#c7eae5", "#018852","#5ab4ac", "#01665e"))+theme_light()+theme(axis.text.x = element_text(angle = 30,hjust = 1))

p3<-ggplot2::ggplot(count1[count1$posi=="contrast_right",],
                    aes(x=sessionname,y=count,group=type,fill=type))+geom_bar(stat = "identity")+
  scale_fill_manual(values = c("#c7eae5", "#018852","#5ab4ac", "#01665e"))+theme_light()+theme(axis.text.x = element_text(angle = 30,hjust = 1))


plot_grid(p2, p3,labels = c("contrast_left","contrast_right"), ncol = 2)

Analysis:

For plot p1, I find that the neuron count for session 4 exceeds 1500, which is larger than in any other session. This may indicate more detailed data collected during session 4; that session likely captured more neural activity. At the same time, I found that session 10 had the most trials of all the sessions.

For plots p2 and p3, in contrast_left session 10 has more trials of type 1 (greater than 100) than in contrast_right (less than 100). I suspect this may reflect a bias in stimulus presentation: the experimental design in session 10 may have intentionally presented more high-contrast (level 1) stimuli to the subject’s left visual field.

Another possibility is a test of hemispheric lateralization: researchers may be interested in how the two hemispheres of the brain differ. By presenting more high-contrast stimuli to one side, they could explore how the brain responds differently to the left and right visual fields.

count2<-data.table()
for (i in 1:18) {
  count<-data.table()
  count$type<-names(table(session[[i]]$feedback_type))
  # coerce the table to a plain numeric vector so ggplot picks a continuous scale
  count$count<-as.numeric(table(session[[i]]$feedback_type))
  count$sessionname<-paste0("session",i)
  count2<-rbind(count2,count)
}
count2<-as.data.frame(count2)
p4<-ggplot2::ggplot(count2,
                aes(x=sessionname,y=count,group=type,fill=type))+geom_bar(stat = "identity")+
  scale_fill_manual(values = c("#123294", "#723999"))+theme_light()+theme(axis.text.x = element_text(angle = 30,hjust = 1))
  
plot(p4)

meta$success_level <- cut(meta$success_rate, breaks = c(0, 0.65, 0.8, 1), labels = c("low", "medium", "high"), include.lowest = TRUE)

p5 <- ggplot(meta) +
  geom_text(aes(x=session, y=n_trials+8, label=signif(success_rate, 2))) +
  geom_bar(aes(x=session, y=n_trials, fill = success_level), stat = "identity") +
  scale_fill_manual(values = c("low" = "red", "medium" = "yellow", "high" = "green")) +
  theme_light() +
  theme(axis.text.x = element_text(angle = 30,hjust = 1)) +
  ylab("n_trials")

plot(p5)

From plot 4, we can see that session 15 has the highest number of successful trials and session 1 has the lowest. In plot 5 I give a more direct numerical view of the overall data, showing the success rate in each session. The session with the highest success rate was session 17 (0.83).
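The low/medium/high colouring in plot 5 relies on `cut()` with break points at 0.65 and 0.8, as in the `meta$success_level` line above. A small self-contained check of that binning (the example rates are made up; with the default right-closed intervals, values exactly on a break fall into the lower bin):

```r
rates <- c(0.61, 0.65, 0.70, 0.80, 0.83)

# Intervals: [0, 0.65], (0.65, 0.8], (0.8, 1]
levels_out <- cut(rates,
                  breaks = c(0, 0.65, 0.8, 1),
                  labels = c("low", "medium", "high"),
                  include.lowest = TRUE)
print(levels_out)  # low low medium medium high
```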

ii. Explore the neural activities during each trial.
all_areas = unique(unlist(lapply(session, function(x) unique(x$brain_area))))
print(all_areas)
##  [1] "ACA"   "MOs"   "LS"    "root"  "VISp"  "CA3"   "SUB"   "DG"    "CA1"  
## [10] "VISl"  "VISpm" "POST"  "VISam" "MG"    "SPF"   "LP"    "MRN"   "NB"   
## [19] "LGd"   "TH"    "VPL"   "VISa"  "LSr"   "OLF"   "ORB"   "PL"    "AUD"  
## [28] "SSp"   "LD"    "CP"    "EPd"   "PIR"   "ILA"   "TT"    "PO"    "ORBm" 
## [37] "MB"    "SCm"   "SCsg"  "POL"   "GPe"   "VISrl" "MOp"   "LSc"   "PT"   
## [46] "MD"    "LH"    "ZI"    "SCs"   "RN"    "MS"    "RSP"   "PAG"   "BLA"  
## [55] "VPM"   "SSs"   "RT"    "MEA"   "ACB"   "OT"    "SI"    "SNr"

Analysis: This shows the 62 unique brain areas recorded across the sessions in the dataset. These are abbreviations for different regions of the mouse brain. Here is a brief description of some of them, based on looking them up: “ACA”: anterior cingulate area; “MOs”: secondary motor area; “LS”: lateral septal nucleus; “VISp”: primary visual area; “CA3”: hippocampal CA3 region; “SUB”: subiculum; “DG”: dentate gyrus; “CA1”: hippocampal CA1 region; “VISl”: lateral visual area; “VISpm”: posteromedial visual area.
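The area list above is the union of the `brain_area` vectors over all 18 sessions, via `unique(unlist(lapply(...)))`. The same union logic on two mock sessions (the mock list structure mirrors the real data; the area labels are real atlas abbreviations, but which areas appear in which session here is invented):

```r
# Mock sessions: each is a list with a brain_area vector, as in the real data
session_mock <- list(
  list(brain_area = c("ACA", "MOs", "VISp", "CA1")),
  list(brain_area = c("VISp", "DG", "CA1", "root"))
)

all_areas <- unique(unlist(lapply(session_mock, function(x) unique(x$brain_area))))
print(all_areas)
print(length(all_areas))  # 6 distinct areas across the two mock sessions
```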

library(tidyverse)
library(cowplot)
library(RColorBrewer)
i.s=2 # indicator for this session
i.t=1 # indicator for this trial 

average_spike_area<-function(i.t,this_session){
  spk.trial = this_session$spks[[i.t]]
  area= this_session$brain_area
  spk.count=apply(spk.trial,1,sum)
  spk.average.tapply=tapply(spk.count, area, mean)
  return(spk.average.tapply)
}

f<-function(i.s){
n.trial=length(session[[i.s]]$feedback_type)
n.area=length(unique(session[[i.s]]$brain_area ))

trial.summary =matrix(nrow=n.trial,ncol= n.area+1+2+1)
for(i.t in 1:n.trial){
  trial.summary[i.t,]=c(average_spike_area(i.t,this_session = session[[i.s]]),
                        session[[i.s]]$feedback_type[i.t],
                        session[[i.s]]$contrast_left[i.t],
                        session[[i.s]]$contrast_right[i.t],
                        i.t)
}

colnames(trial.summary)=c(names(average_spike_area(i.t,this_session = session[[i.s]])), 'feedback', 'left contr.','right contr.','id' )

trial.summary<-as.data.frame(trial.summary)
trial.summary$mouse<-session[[i.s]]$mouse_name
trial.summary$session<-paste0("session",i.s)
trial.summary
}

session_1<-f(1)
# Turning it into a data frame
plot_session<-function(data){
trial.summary <- as_tibble(data)
plot_data<-trial.summary%>%.[,c(1:c(ncol(.)-6),ncol(.)-2)]%>%gather(key="area",value="spk",-"id")

ggplot(plot_data)+
  geom_smooth(aes(x=id,y=spk,group=area,color=area),se=F,method="loess")+
  geom_line(aes(x=id,y=spk,group=area,color=area))+
  scale_color_manual(values = c(brewer.pal(10,"Set1"),brewer.pal(9,"Set2")))+
  theme_light()
}
plot_session(session_1)
## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
## Warning in brewer.pal(9, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## `geom_smooth()` using formula = 'y ~ x'

session_2<-f(3)
plot_session(session_2)
## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors

## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## `geom_smooth()` using formula = 'y ~ x'

####### trial-level raster plot
trial <- function(s, t){
  this_session <- session[[s]]
  a <- this_session$spks[[t]]
  a[a > 1] <- 1                              # binarize: any spike in a bin counts as 1
  a <- as.data.frame(a)
  colnames(a) <- this_session$time[[t]]      # label columns with time-bin centers
  a$id <- 1:nrow(a)
  a$name <- this_session$brain_area
  b <- gather(a, key = "time", value = "spk", -name, -id)
  b$time <- as.numeric(b$time)
  spk.times <- b[b$spk != 0, ]$time
  ggplot(b[b$spk != 0, ]) +
    geom_point(aes(x = time, y = id, group = name, color = name)) +
    scale_color_manual(values = c(brewer.pal(9, "Set1"), brewer.pal(8, "Set2"))) +
    theme_light() +
    theme(axis.text.x = element_text(angle = 30, hjust = 1)) +
    ggtitle(paste0("feedback_", this_session$feedback_type[t])) +
    scale_x_continuous(limits = c(min(spk.times), max(spk.times)),
                       breaks = seq(min(spk.times), max(spk.times), 0.09))
}
trial(1,1)
## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors

## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors

trial(1,2)
## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors

## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors

trial(1,3)
## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors

## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors

plot_grid(trial(1,1), trial(1,3), ncol = 2)
## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors

## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors
## Warning in brewer.pal(9, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors

spks.trial <- session[[1]]$spks[[1]]        # neurons x time bins, Session 1, Trial 1

total.spikes <- apply(spks.trial, 1, sum)   # total spikes per neuron

avg.spikes <- mean(total.spikes)
cat("Average number of spikes per neuron in Trial 1:", avg.spikes, "\n")
## Average number of spikes per neuron in Trial 1: 1.581744
active.neurons <- sum(total.spikes > 0)     # neurons with at least one spike
avg.spikes.active <- sum(total.spikes) / active.neurons
cat("Average number of spikes per active neuron in Trial 1:", avg.spikes.active, "\n")
## Average number of spikes per active neuron in Trial 1: 3.806557
# A lookup (Wikipedia) indicates that VISp is the primary visual area, making it the
# key region for studying how mice process visual information. Based on the initial
# visualization above, reduce the variables and display VISp only.

plot_session <- function(data) {
  trial.summary <- as_tibble(data)
  plot_data <- trial.summary %>%
    .[, c(1:(ncol(.) - 6), ncol(.) - 2)] %>%
    gather(key = "area", value = "spk", -"id") %>%
    filter(area == "VISp")                   # keep only the primary visual area

  ggplot(plot_data) +
    geom_smooth(aes(x = id, y = spk, group = area, color = area), se = FALSE, method = "loess") +
    geom_line(aes(x = id, y = spk, group = area, color = area)) +
    scale_color_manual(values = c(brewer.pal(9, "Set1"), brewer.pal(8, "Set2"))) +
    theme_light()
}


session_1 <- f(1)
plot_session(session_1)
## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors

## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## `geom_smooth()` using formula = 'y ~ x'

session_2 <- f(3)
plot_session(session_2)
## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set1 is 9
## Returning the palette you asked for with that many colors

## Warning in brewer.pal(10, "Set1"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
## `geom_smooth()` using formula = 'y ~ x'

  1. 1.581744 is the average number of spikes per neuron in the first 0.4 seconds of Trial 1 in Session 1.

  2. 3.806557 is the average number of spikes per active neuron, i.e., per neuron that fired at least once. Analysis:

  3. The first two plots, generated by plot_session(session_1) and plot_session(session_2), visualize the average neural activity in each brain area across the trials of Sessions 1 and 3, respectively. The x-axis is the trial id within the session and the y-axis is the average spike count; each line corresponds to one brain area, so the plots show how average activity in each area evolved over the course of a session. In the Session 1 plot, the eight brain areas are drawn in different colors, and SUB (brown line) has the highest average, fluctuating around 2.5. The second plot visualizes the eleven brain areas of Session 3, where the orange VISp line sits highest. VISp is the primary visual area of the mouse brain.

  4. The last two plots, generated by plot_grid, are raster plots showing the spiking activity of each neuron over time in the first and third trials of Session 1. Each point is a spike; the y-axis is the neuron id, the x-axis is the time at which the spike occurred, and the color indicates the neuron's brain area. In the left plot, neurons from different areas appear in stacked bands: the blue points (CA3) lie between ids 400 and 600, and the red points (ACA) between ids 0 and 400. This shows the id range of neurons in each brain area and the temporal distribution of their spikes. The right plot shows a similar layout, with SUB spanning roughly ids 400-600. Comparing the left and right plots (the first and third trials, respectively) reveals changes in neuronal spike patterns across trials. The stacked bands along the y-axis indicate that neurons are grouped by brain area rather than interleaved, reflecting the structured nature of the recordings, in which different brain areas were recorded in a fixed order.

  5. The two single-line (red) plots show the average spike activity of VISp across trials in Sessions 1 and 3. Because the mice receive visual stimulation, it is worth analyzing VISp on its own. In the first VISp plot, the curve fluctuates between about 1.5 and 2 as the trial id increases, suggesting a fairly steady level of activity. The second plot shows VISp activity in Session 3, where the curve trends downward and gradually flattens as the trial id grows; one possible interpretation is that the mice adapt to the visual stimuli as the session progresses.
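One way to make the trend readings in point 5 concrete is a simple linear fit of VISp average spikes against trial id. This is a hedged sketch on toy data, not part of the original analysis; with the real data, the data frame would be a session summary returned by f(), which contains a VISp column and an id column.

```r
# Toy stand-in for a session summary: a mild decline in VISp activity
# over 100 trials plus noise (hypothetical data for illustration)
set.seed(1)
toy <- data.frame(
  id   = 1:100,
  VISp = 2 - 0.005 * (1:100) + rnorm(100, sd = 0.1)
)
fit <- lm(VISp ~ id, data = toy)   # linear trend of activity vs trial id
slope <- unname(coef(fit)["id"])
slope < 0                          # a negative slope supports a downward trend
```

Applied per session, the sign and size of the slope give a simple numeric summary of whether VISp activity stays level or declines as trials progress.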

(iii). Exploring Homogeneity and Heterogeneity between Sessions and Mice. To explore homogeneity and heterogeneity across sessions and mice, I use the anova function for analysis of variance, testing whether the feedback the mice received differs significantly from mouse to mouse or from session to session. This gives insight into whether individual mice behave differently and whether there are differences between sessions.
# Build per-session summaries for all 18 sessions and stack them into one data frame
session_all <- list()
for(i in 1:18){
  session_all[[i]] <- assign(paste0("session_", i), f(i))
}
session_all <- Reduce(bind_rows, session_all)
session_all[is.na(session_all)] <- 0   # brain areas absent from a session get 0



a <- leveneTest(feedback ~ mouse, session_all)     # Levene's test (car package) for equal variances across mice
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
b <- leveneTest(feedback ~ session, session_all)   # Levene's test for equal variances across sessions
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
r<-aov(feedback~mouse,session_all)
summary(r)
##               Df Sum Sq Mean Sq F value   Pr(>F)    
## mouse          3     39  12.983    15.9 2.73e-10 ***
## Residuals   5077   4145   0.816                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
t<-aov(feedback~session,session_all)
summary(t)
##               Df Sum Sq Mean Sq F value   Pr(>F)    
## session       17     86   5.053   6.243 9.66e-15 ***
## Residuals   5063   4098   0.809                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(t, conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = feedback ~ session, data = session_all)
## 
## $session
##                             diff          lwr          upr     p adj
## session10-session1   0.028847286 -0.300645794  0.358340366 1.0000000
## session11-session1   0.380116959  0.040501073  0.719732845 0.0115487
## session12-session1   0.265944272 -0.073921240  0.605809784 0.3586391
## session13-session1   0.382807018  0.037298966  0.728315069 0.0134066
## session14-session1   0.177533386 -0.173608723  0.528675495 0.9532285
## session15-session1   0.319176655 -0.013860606  0.652213915 0.0785467
## session16-session1   0.225187970 -0.123701389  0.574077329 0.7174533
## session17-session1   0.450187970  0.088900442  0.811475498 0.0018895
## session18-session1   0.400584795  0.037047708  0.764121882 0.0145078
## session2-session1    0.056405955 -0.298267104  0.411079014 1.0000000
## session3-session1    0.114035088 -0.246181957  0.474252132 0.9997815
## session4-session1    0.122807018 -0.232310640  0.477924675 0.9993025
## session5-session1    0.112308330 -0.241709944  0.466326603 0.9997753
## session6-session1    0.272232305 -0.074912214  0.619376824 0.3545582
## session7-session1    0.130743525 -0.223709674  0.485196725 0.9984315
## session8-session1    0.077473684 -0.277420854  0.432368223 0.9999990
## session9-session1    0.160441426 -0.175733481  0.496616333 0.9727422
## session11-session10  0.351269673  0.125667779  0.576871567 0.0000086
## session12-session10  0.237096986  0.011119486  0.463074487 0.0281367
## session13-session10  0.353959732  0.119581665  0.588337798 0.0000210
## session14-session10  0.148686100 -0.093920671  0.391292870 0.7930733
## session15-session10  0.290329369  0.074757845  0.505900892 0.0003796
## session16-session10  0.196340684 -0.042993913  0.435675281 0.2738240
## session17-session10  0.421340684  0.164268822  0.678412546 0.0000017
## session18-session10  0.371737509  0.111513609  0.631961409 0.0000932
## session2-session10   0.027558669 -0.220131143  0.275248481 1.0000000
## session3-session10   0.085187802 -0.170377422  0.340753025 0.9995669
## session4-session10   0.093959732 -0.154366295  0.342285758 0.9978603
## session5-session10   0.083461044 -0.163290253  0.330212341 0.9994760
## session6-session10   0.243385019  0.006601191  0.480168847 0.0362912
## session7-session10   0.101896239 -0.145478648  0.349271127 0.9942189
## session8-session10   0.048626398 -0.199380453  0.296633249 0.9999998
## session9-session10   0.131594140 -0.088793774  0.351982054 0.8256668
## session12-session11 -0.114172687 -0.354670277  0.126324904 0.9741127
## session13-session11  0.002690058 -0.245717547  0.251097664 1.0000000
## session14-session11 -0.202583573 -0.458769649  0.053602503 0.3390730
## session15-session11 -0.060940305 -0.291687660  0.169807051 0.9999826
## session16-session11 -0.154928989 -0.408018518  0.098160540 0.7946190
## session17-session11  0.070071011 -0.199853428  0.339995450 0.9999865
## session18-session11  0.020467836 -0.252460247  0.293395920 1.0000000
## session2-session11  -0.323711004 -0.584715800 -0.062706208 0.0020640
## session3-session11  -0.266081871 -0.534571804  0.002408061 0.0552629
## session4-session11  -0.257309942 -0.518918573  0.004298690 0.0599937
## session5-session11  -0.267808629 -0.527922956 -0.007694303 0.0355721
## session6-session11  -0.107884654 -0.358563416  0.142794108 0.9906019
## session7-session11  -0.249373434 -0.510079390  0.011332523 0.0800783
## session8-session11  -0.302643275 -0.563948955 -0.041337595 0.0068000
## session9-session11  -0.219675533 -0.454928786  0.015577720 0.1009905
## session13-session12  0.116862745 -0.131886033  0.365611523 0.9766398
## session14-session12 -0.088410887 -0.344927790  0.168106017 0.9993320
## session15-session12  0.053232382 -0.177882218  0.284346982 0.9999977
## session16-session12 -0.040756303 -0.294180701  0.212668096 1.0000000
## session17-session12  0.184243697 -0.085994751  0.454482146 0.6235206
## session18-session12  0.134640523 -0.138598118  0.407879164 0.9630476
## session2-session12  -0.209538317 -0.470867840  0.051791206 0.3138046
## session3-session12  -0.151909185 -0.420714802  0.116896433 0.8823897
## session4-session12  -0.143137255 -0.405069864  0.118795355 0.9093842
## session5-session12  -0.153635943 -0.414076107  0.106804221 0.8395544
## session6-session12   0.006288032 -0.244728815  0.257304880 1.0000000
## session7-session12  -0.135200747 -0.396231802  0.125830308 0.9420759
## session8-session12  -0.188470588 -0.450100622  0.073159446 0.5191240
## session9-session12  -0.105502846 -0.341116320  0.130110628 0.9856631
## session14-session13 -0.205273632 -0.469220916  0.058673652 0.3702191
## session15-session13 -0.063630363 -0.302965286  0.175704560 0.9999809
## session16-session13 -0.157619048 -0.418561902  0.103323806 0.8116876
## session17-session13  0.067380952 -0.209920447  0.344682352 0.9999949
## session18-session13  0.017777778 -0.262448206  0.298003761 1.0000000
## session2-session13  -0.326401062 -0.595027879 -0.057774246 0.0030077
## session3-session13  -0.268771930 -0.544677180  0.007133320 0.0664502
## session4-session13  -0.260000000 -0.529213557  0.009213557 0.0726633
## session5-session13  -0.270498688 -0.538260384 -0.002736991 0.0445300
## session6-session13  -0.110574713 -0.369180020  0.148030595 0.9912445
## session7-session13  -0.252063492 -0.520399958  0.016272974 0.0955001
## session8-session13  -0.305333333 -0.574252507 -0.036414160 0.0092978
## session9-session13  -0.222365591 -0.466047672  0.021316489 0.1242836
## session15-session14  0.141643269 -0.105755544  0.389042082 0.8704220
## session16-session14  0.047654584 -0.220703647  0.316012816 1.0000000
## session17-session14  0.272654584 -0.011635819  0.556944988 0.0779821
## session18-session14  0.223051410 -0.064092403  0.510195222 0.3724462
## session2-session14  -0.121127431 -0.396963132  0.154708270 0.9883281
## session3-session14  -0.063498298 -0.346427042  0.219430446 0.9999984
## session4-session14  -0.054726368 -0.331133507  0.221680771 0.9999998
## session5-session14  -0.065225056 -0.340218316  0.209768205 0.9999964
## session6-session14   0.094698919 -0.171386917  0.360784756 0.9989953
## session7-session14  -0.046789860 -0.322342806  0.228763086 1.0000000
## session8-session14  -0.100059701 -0.376180126  0.176060723 0.9987406
## session9-session14  -0.017091960 -0.268698644  0.234514725 1.0000000
## session16-session15 -0.093988685 -0.338179546  0.150202177 0.9973723
## session17-session15  0.131011315 -0.130587748  0.392610379 0.9570873
## session18-session15  0.081408141 -0.183289055  0.346105337 0.9998523
## session2-session15  -0.262770699 -0.515156060 -0.010385338 0.0310054
## session3-session15  -0.205141567 -0.465260214  0.054977081 0.3440403
## session4-session15  -0.196369637 -0.449379405  0.056640131 0.3740559
## session5-session15  -0.206868325 -0.458332696  0.044596046 0.2690999
## session6-session15  -0.046944350 -0.288635700  0.194747001 0.9999998
## session7-session15  -0.188433129 -0.440509432  0.063643174 0.4466677
## session8-session15  -0.241702970 -0.494399479  0.010993538 0.0801086
## session9-session15  -0.158735228 -0.384387527  0.066917071 0.5650278
## session17-session16  0.225000000 -0.056503175  0.506503175 0.3195121
## session18-session16  0.175396825 -0.108987727  0.459781377 0.7845973
## session2-session16  -0.168782015 -0.441744170  0.104180140 0.7811185
## session3-session16  -0.111152882 -0.391280850  0.168975086 0.9962369
## session4-session16  -0.102380952 -0.375920548  0.171158644 0.9981247
## session5-session16  -0.112879640 -0.384990459  0.159231179 0.9937400
## session6-session16   0.047044335 -0.216061493  0.310150163 1.0000000
## session7-session16  -0.094444444 -0.367120865  0.178231976 0.9992884
## session8-session16  -0.147714286 -0.420964159  0.125535588 0.9169600
## session9-session16  -0.064746544 -0.313199613  0.183706526 0.9999857
## session18-session17 -0.049603175 -0.349068476  0.249862126 1.0000000
## session2-session17  -0.393782015 -0.682422329 -0.105141700 0.0002832
## session3-session17  -0.336152882 -0.631578967 -0.040726797 0.0089939
## session4-session17  -0.327380952 -0.616567404 -0.038194501 0.0097265
## session5-session17  -0.337879640 -0.625714994 -0.050044287 0.0054582
## session6-session17  -0.177955665 -0.457293399  0.101382069 0.7378506
## session7-session17  -0.319444444 -0.607814560 -0.031074329 0.0134410
## session8-session17  -0.372714286 -0.661626706 -0.083801866 0.0009538
## session9-session17  -0.289746544 -0.555328585 -0.024164502 0.0167063
## session2-session18  -0.344178840 -0.635629980 -0.052727700 0.0049381
## session3-session18  -0.286549708 -0.584722656  0.011623241 0.0764049
## session4-session18  -0.277777778 -0.569769797  0.014214242 0.0845257
## session5-session18  -0.288276465 -0.578930429  0.002377498 0.0547817
## session6-session18  -0.128352490 -0.410593709  0.153888728 0.9831727
## session7-session18  -0.269841270 -0.561024819  0.021342279 0.1083248
## session8-session18  -0.323111111 -0.614831734 -0.031390488 0.0134681
## session9-session18  -0.240143369 -0.508777612  0.028490874 0.1478815
## session3-session2    0.057629133 -0.229670140  0.344928405 0.9999997
## session4-session2    0.066401062 -0.214478101  0.347280225 0.9999966
## session5-session2    0.055902375 -0.223585534  0.335390284 0.9999997
## session6-session2    0.215826350 -0.054902057  0.486554756 0.3242220
## session7-session2    0.074337570 -0.205701041  0.354376182 0.9999813
## session8-session2    0.021067729 -0.259529290  0.301664748 1.0000000
## session9-session2    0.104035471 -0.152475974  0.360546916 0.9951531
## session4-session3    0.008771930 -0.279076024  0.296619884 1.0000000
## session5-session3   -0.001726758 -0.288217302  0.284763786 1.0000000
## session6-session3    0.158197217 -0.119754596  0.436149031 0.8759689
## session7-session3    0.016708438 -0.270319373  0.303736249 1.0000000
## session8-session3   -0.036561404 -0.324134050  0.251011243 1.0000000
## session9-session3    0.046406338 -0.217717613  0.310530290 1.0000000
## session5-session4   -0.010498688 -0.290550582  0.269553207 1.0000000
## session6-session4    0.149425287 -0.121885314  0.420735889 0.9035822
## session7-session4    0.007936508 -0.272664982  0.288537998 1.0000000
## session8-session4   -0.045333333 -0.326492113  0.235825446 1.0000000
## session9-session4    0.037634409 -0.219491424  0.294760241 1.0000000
## session6-session5    0.159923975 -0.109946049  0.429793999 0.8344405
## session7-session5    0.018435196 -0.260773657  0.297644048 1.0000000
## session8-session5   -0.034834646 -0.314603562  0.244934270 1.0000000
## session9-session5    0.048133096 -0.207472226  0.303738419 0.9999999
## session7-session6   -0.141488779 -0.411929091  0.128951533 0.9368258
## session8-session6   -0.194758621 -0.465777117  0.076259875 0.5238688
## session9-session6   -0.111790879 -0.357787748  0.134205991 0.9832945
## session8-session7   -0.053269841 -0.333588908  0.227049225 0.9999999
## session9-session7    0.029697901 -0.226509464  0.285905265 1.0000000
## session9-session8    0.082967742 -0.173849852  0.339785336 0.9997134
  1. Homogeneity and heterogeneity here describe how feedback varies across sessions: feedback is homogeneous if it is consistent from session to session, and heterogeneous if it differs significantly.

  2. ANOVA assumes homogeneity of variance, i.e., that the variability of feedback within each session is roughly the same; Levene's test above checks this assumption. If it is violated (for instance, if some sessions exhibit much higher variability than others), the validity of the ANOVA results may be affected.

  3. I used the Tukey test (taught in STA 106) alongside ANOVA: ANOVA compares the means of more than two groups, while Tukey's test identifies which specific pairs of groups have different means. With a p-value of 2.73e-10 (< 0.001), the first table shows a significant difference between groups for the variable "mouse". With a p-value of 9.66e-15 (< 0.001), the second table shows that the variable "session" also differs significantly between groups. The Tukey multiple-comparison table that follows compares sessions pairwise, reporting the mean difference ("diff"), the lower and upper bounds of the confidence interval ("lwr" and "upr"), and the p-value adjusted for multiple testing ("p adj"). For example, session 11 differs significantly from session 1 (p adj = 0.0115487), as does session 13 (p adj = 0.0134066), while many other comparisons are not statistically significant (p adj > 0.05).
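Rather than scanning the full Tukey table by eye, the significant pairs can be extracted programmatically. This is a minimal sketch on simulated data (the feedback probabilities are hypothetical); with the real analysis, `sim` would simply be `session_all` and the model the same `aov(feedback ~ session, ...)` fit as above.

```r
# Simulated stand-in for session_all: three sessions with different success rates
set.seed(42)
sim <- data.frame(
  feedback = c(sample(c(-1, 1), 100, replace = TRUE, prob = c(0.5, 0.5)),
               sample(c(-1, 1), 100, replace = TRUE, prob = c(0.1, 0.9)),
               sample(c(-1, 1), 100, replace = TRUE, prob = c(0.35, 0.65))),
  session  = rep(paste0("session", 1:3), each = 100)
)
fit <- aov(feedback ~ session, data = sim)
tk  <- TukeyHSD(fit, conf.level = 0.95)$session   # matrix with diff, lwr, upr, p adj
sig <- tk[tk[, "p adj"] < 0.05, , drop = FALSE]   # keep only significant pairs
sig[order(sig[, "p adj"]), ]                      # most significant first
```

Applied to the 18-session fit, this reduces the 153 pairwise rows to the handful of comparisons worth discussing.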

Section 3. Predictive modeling

Section 4. Prediction performance on the test sets

setwd("/Users/yoursflo/Downloads/test")
test1 <- readRDS("test1.rds")
test2 <- readRDS("test2.rds")
test_all <- list()
for(i in 1:2){
  test_all[[i]] <- assign(paste0("test", i), f(i))
}
test_all <- Reduce(bind_rows, test_all)
# Prepend one row of session_all so test_all gets the same columns as the
# training data, then drop that row and zero-fill the remaining NAs
test_all <- bind_rows(session_all[1,], test_all)
test_all <- test_all[-1,]
test_all[is.na(test_all)] <- 0
predict_out <- predict(model3, newdata = test_all, type = "response")

### Prediction evaluation
threshold <- 0.5
predicted_labels <- ifelse(predict_out >= threshold, "Positive", "Negative")
# Confusion matrix: predicted labels vs. actual feedback (-1 / 1)
confusion_matrix <- table(predicted_labels, test_all$feedback)
print(confusion_matrix)
##                 
## predicted_labels  -1   1
##         Negative  39  27
##         Positive  98 201
# Compute the evaluation metrics, treating feedback 1 as the positive class
accuracy <- sum(diag(confusion_matrix)) / sum(confusion_matrix)
recall <- confusion_matrix["Positive", "1"] / sum(confusion_matrix[, "1"])            # TP / (TP + FN)
precision <- confusion_matrix["Positive", "1"] / sum(confusion_matrix["Positive", ])  # TP / (TP + FP)
f1_score <- 2 * (precision * recall) / (precision + recall)

print(paste("accuracy:", accuracy))
## [1] "accuracy: 0.657534246575342"
print(paste("recall:", recall))
## [1] "recall: 0.881578947368421"
print(paste("precision:", precision))
## [1] "precision: 0.672240802675585"
print(paste("F1 value:", f1_score))
## [1] "F1 value: 0.76280834914611"
library(ggplot2)

plot_data <- data.frame(
  Predicted = factor(predicted_labels, levels = c("Negative", "Positive")),
  Actual = factor(test_all$feedback, levels = c("-1", "1"))
)

ggplot(plot_data, aes(x = Predicted, fill = Actual)) +
  geom_bar(position = "fill") +
  labs(x = "Predicted", y = "Proportion") +
  scale_fill_manual(values = c("#999999", "#E69F00"), labels = c("Negative", "Positive")) +
  ggtitle("Confusion Matrix")

Section 5. Discussion

- Accuracy: 65.75% (0.657534246575342)

- Recall: 88.16% (0.881578947368421)

- Precision: 67.22% (0.672240802675585)

- F1 score: 0.76280834914611 (76.28%)

These measures assess the prediction performance of the logistic regression model (model3) on the test_all dataset. Accuracy is the overall proportion of correct predictions, precision is the proportion of predicted successes that were truly successful, recall is the proportion of actual successes the model identified, and the F1 score is the harmonic mean of precision and recall. The model shows moderate accuracy and precision but high recall: it predicts success for most trials, which captures nearly all of the true successes at the cost of a substantial number of false positives. The F1 score balances this trade-off between precision and recall.
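The 0.5 cutoff used in the evaluation is only one choice; sweeping the threshold makes the accuracy/recall trade-off visible. This is a hedged sketch on simulated probabilities; with the real model, `p` would be `predict_out` and `y` would be `test_all$feedback`.

```r
# Simulated predicted probabilities and correlated outcomes (hypothetical)
set.seed(7)
p <- runif(365)                      # stand-in for predict_out
y <- ifelse(runif(365) < p, 1, -1)   # stand-in for test_all$feedback
for (th in c(0.3, 0.5, 0.7)) {
  pred <- ifelse(p >= th, 1, -1)
  acc  <- mean(pred == y)                         # overall accuracy
  rec  <- sum(pred == 1 & y == 1) / sum(y == 1)   # recall on the positive class
  cat(sprintf("threshold %.1f: accuracy %.3f, recall %.3f\n", th, acc, rec))
}
```

Recall can only decrease as the threshold rises, while precision typically increases, so the cutoff should reflect which kind of error matters more for the application.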

#Acknowledgement

For the exploratory analysis, I drew on the Project Consulting portion of the course. I also asked ChatGPT about handling the large dataset so that the exploratory analysis could be more intuitive.

For the data integration and the discussion of homogeneity and heterogeneity, I referred to my previous STA 106 coursework, which mainly helped me set up the Tukey tests between groups. In the model-selection section, the code was refined with the help of ChatGPT, iteratively screening out variables.